perm filename CHAOS.NET[DLN,MRC] blob
sn#311218 filedate 1977-10-20 generic text, type C, neo UTF8
CHAOS ORDER
**** DRAFT ****
NOTES:
Work more on dynamic flow control; see end of that section. Datagrams
seem to have a bug: the two parties can never agree on whether they
both agree that it really happened. Am I losing? Flush cruft at end?
Delete NOP since WIN is just as good and keeps window size more consistent
in the face of lost packets.
-------
Goals
Non-goals
Hardware Assumptions and Constraints
User appearance
Connections
Contact Names
ITS implementation
Lisp Machine implementation
Network Control Program
Packets
Hosts attached to more than one subnet
Subnet and Host numbers
Packet numbers
Indices
Routing
Operations
Table of packet field use vs opcode
Flow and Error Control
Dynamic Window-size Adjustment
Uniquization
Media Handlers
Buffers
ITS System Calls
Comparison with LCSNET
Principal differences
High priority data packets, interrupts, and flushing
Data grams
Multiple messages per packet
Checksums in the packet
Host-independent user-level protocols
Very Small Hosts
Transmission Media
Ethernet
TEN11 Interface
DL10 & DTE20
Asynchronous line
Higher-Level Protocols
Telnet
File Access
Mail
Locate-named-service
>Goals
High speed communication between processes running in various local
machines. By "high speed", I mean much faster than the Arpanet. At least
comparable to TU20 magtape (30000 characters/second), about 10 times the
speed of the Arpanet.
No undetected errors in data transmission.
Not to depend on a particular medium. (However, we are compromising by
picking a fixed packet size. The simplicity and efficiency are worth it.)
Simple enough to put in small pdp11's. Also, simple to the user.
As much power as the Arpanet but, hopefully, a lot less hair.
Work well for both "telnet" and "file transfer."
The initial implementation in ITS should have the "in-system" part as
small and simple as possible. This includes no per-host tables in the
NCP, which allows large nets and easy reconfiguration. There is, of
course, a host-name to address translation table used by "user" programs.
The NCP has to have a per-subnet table which remembers where the bridge(s)
to that subnet from the local subnet are.
>Non-goals
Byte sizes other than 8 bits. (pdp10 binary transmission should be part
of a user-level file-transfer/ML-device protocol.)
Compatibility with the Arpanet.
Substituting for TEN11 interface functions such as running the AI TV11 and
XGP.
Automatic routing will be deferred. Initially the routing tables will be
assembled into the programs. A host needs one routing table for each
subnet it is connected to (or one for each network interface it
possesses.)
>Hardware Assumptions and Constraints
Transmission is physically in "packets" which have headers, rather than
in, e.g., continuous streams.
The prototype caiosnet (ether) interface limits the physical length of a
packet to 1025 bits including overhead bits. The net result is that the
maximum number of data bytes in any packet is 104. This limitation will
be extended to the whole network (to keep things simple). However some
provision will be made for a possible network-wide packet-size expansion.
All transmission media will be assumed to be highly-reliable but not
perfect; "perfect" reliability will be assured by having the two ends of a
connection use an acknowledgement protocol which detects lost messages.
Transmission media are required to lose any messages that they don't
deliver intact. (I.e. there must be hardware checksums.)
The acknowledgement protocol must be designed not to limit performance.
Statistical flow control (see below).
>User appearance
The network allows user processes in various machines to communicate with
each other in various ways, for instance, in imitation of a terminal, or
in imitation of a disk file system. These facilities are built on top of
the basic capability to send "packets" (a header plus some data in the
form of 8-bit bytes) through the network. The network undertakes never to
lose or garble any packets, except when the connection is cut off
entirely.
This document defines the low-level, "in-system" part of the protocol. On
top of this, special programs (running in user-mode) will implement the
higher-level protocol that the general user program sees. These protocols
and programs won't be discussed further in this document, but remember
that the strange packet formats and so forth are not seen by most user
programs.
>>Connections
When two processes wish to communicate, they establish a connection
between them. This connection allows two streams of packets to flow, one
in each direction. [Explain why connections should be bi-directional
rather than uni-directional. Basically that's what you always want, and
it makes things simpler.]
Connections are essentially the only facility provided by the network.
However, when first establishing the connection it is necessary for the
two processes to contact each other, and make each other known to their
respective operating systems. In addition, it is often the case (in the
usual user-server situation) that one of the processes does not exist
beforehand, but is to be created and made to run a specified program.
>>Contact Names
The way we choose to implement contacting is to say that one process is
always a "user" and one process is always a "server". The server has some
"contact name" to which it "listens". The user requests its operating
system to connect it to a specified contact name at a specified host. If
a process at that host is listening to that contact name, the two are
connected. If no one is listening to that contact name, the operating
system must create a server process which will load itself with the
appropriate program and connect up.
Discovering which host to connect to in order to obtain a given service is
an issue for higher-level protocols. It will not be dealt with at all initially
(that is, there will be a table of host names and numbers and the user
will have to enter the name.)
Once the connection has been established, there is no more need for the
contact name, and it is discarded. Indeed, often the contact name is
simply the name of a network protocol (such as "telnet") and several users
may want to have connections to that service at the same time, so contact
names must be "reusable." (In the other common case, the contact name
will be a "gensym".)
As far as the operating systems involved are concerned, contact names are
simply arbitrary ascii strings defined by user programs. It is expected
that the various higher-level protocols will define standard contact
names; for instance, to get the telnet protocol one would connect to
"telnet"; to get the file transfer protocol one would connect to
"file-transfer". If a machine receives a request to connect to a contact
name which no one is currently listening to, a server process must be
created and made to execute a program which decides, from the contact
name, what server program to load and execute, or else to refuse the
request for connection.
Contact names have no relation to file names; they are simply a device for
introducing two processes to each other. If one was using the network to
transfer a file, one would first contact the file transfer server at the
appropriate host, then send a packet containing the name of the file to be
accessed.
>>ITS system calls
Ordinary user programs will not access the network directly; they will go
indirectly through a job-device or sty-type program which will use a
higher-level protocol to make the network look like what the user wants,
the traditional things being a terminal and a disk file system.
Since these intermediate user-mode programs for using the network will
exist, there is no reason for the interface to the low level network
provided by the system to look at all like a standard device. Instead, it
will be designed solely for simplicity and ease of implementation, and for
a certain degree of efficiency. This interface will be described after
the interface between Network Control Programs in different machines (the
low-level protocol) is described.
At some future time the intermediate programs might get moved into the
system for reasons of efficiency, but that should not be allowed to
complicate the initial implementation.
There will be a .INSRT-able file of routines similar to NETWRK.
>>Lisp Machine implementation
In the case of the Lisp Machine, the only distinction between user
programs and system programs is who maintains and documents them, and how
carefully. The code will be modularized in about the same way as in the
ITS implementation.
>Network Control Program
This is the part of the operating system(s) that implements the network
(obviously).
>>Packets
The NCP's operate by exchanging packets. A packet consists of a header
containing control information, and zero or more 8-bit bytes of data.
Hardware restrictions of the prototype CAIOS net interface restrict the
maximum length of a packet to 61 16-bit words. In fact, we will limit it
to 60 words (to make packet buffers in pdp10's be 32 words including two
overhead words). Again for the convenience of pdp10's, the header should
be an even number of 16-bit words.
In this section the packets will be described as they look to a pdp11.
They look the same inside a Lisp Machine, since the byte structure is the
same. Inside a pdp10, packets are stored with two 16-bit words
left-adjusted in each pdp10 word. Additionally, the bytes in the data
portion of the packet are swapped so as to put them in pdp10 standard
order. pdp11's that act as network interfaces for pdp10's will be
required to do this byte swapping since they're likely to have more time
available than the 10 to do it in, and can also do it faster, having a
special instruction for it. pdp10's that communicate directly to the
network will have hardware assistance for byte reshuffling in their
interfaces. See the transmission media section for how packets are
encapsulated during transmission through the various media.
The header is 8 16-bit words and contains the following fields:
-----------------------
|opcode(8) | unused(8)|
-----------------------
|fc(4) | nbytes(12) |
-----------------------
| destination host # |
-----------------------
| destination index |
-----------------------
| source host # |
-----------------------
| source index |
-----------------------
| packet # |
-----------------------
| ack packet # |
-----------------------
opcode - tells the receiver of the packet how to interpret it. See the
Operations section below. This is 8 bits long. The 128 opcodes with high
order bit =0 are for NCP use. The 128 with high order bit =1 are for
user-program use.
unused 8 bits reserved for future use.
fc - forwarding count. 4 bits which count the number of times this packet
has been forwarded by bridges. Initially this field is always generated
as zero. Each bridge increments it; if it overflows, there is assumed to
be a loop and the packet is discarded.
nbytes - the number of 8-bit bytes of data in the data part. The maximum
value of nbytes is 104. The minimum is 0. This is 12 bits long to allow
for 4K-bit packets. (Actually 12 bits is enough for up to 32K-bit
packets.)
destination host # This is divided into two 8-bit fields. The high byte
specifies which subnet. The low byte specifies which host on that subnet,
and (on ethernet subnets) is identical to the hardware host number.
Neither field may be zero in a valid host number.
destination index - index for this connection assigned by the destination
host's NCP.
source host # - see destination host #.
source index - index for this connection assigned by the source host's
NCP.
packet # - an ascending reference number used in error and flow control
(see below).
ack packet # - used in error and flow control (see below.)
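For concreteness, the header layout above can be sketched in modern terms.
The following Python fragment is purely illustrative; the byte order within
each 16-bit word is an assumption (this document defines 16-bit words, not
a byte order on the wire).

```python
import struct

# Illustrative pack/unpack of the 8-word (16-byte) packet header described
# above. Field names follow the text; little-endian words are an assumption.

def pack_header(opcode, fc, nbytes, dst_host, dst_idx,
                src_host, src_idx, pktno, ackno):
    word0 = (opcode & 0xFF) << 8                  # opcode high byte; unused low byte = 0
    word1 = ((fc & 0xF) << 12) | (nbytes & 0xFFF) # fc(4) | nbytes(12)
    return struct.pack('<8H', word0, word1, dst_host, dst_idx,
                       src_host, src_idx, pktno, ackno)

def unpack_header(raw):
    w = struct.unpack('<8H', raw[:16])
    return {'opcode': w[0] >> 8, 'fc': w[1] >> 12, 'nbytes': w[1] & 0xFFF,
            'dst_host': w[2], 'dst_idx': w[3], 'src_host': w[4],
            'src_idx': w[5], 'pktno': w[6], 'ackno': w[7]}
```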
>>Hosts attached to more than one subnet
(This also applies to hosts with more than one interface to the same
subnet, if there ever are any.)
Such hosts ought to act as bridges. That is, if a packet is received
which is not addressed to this host, it should be sent back out, using
this host's routing tables. The forwarding count should be used to
prevent infinite loops in the event of inconsistent forwarding tables in
two bridges.
It is undesirable for a host to have more than one number. So a host
connected to multiple subnets should choose one subnet as its "home",
which is the address which is advertised as that host. The host's other
network connections are un-named bridges. In some cases it may be
preferable not to pick a "home" subnet; instead, one invents a new private
subnet which only has that one host on it, and all the host's network
connections act as bridges to that subnet (also to each other).
The routing should be set up so that packets destined for such a host from
a subnet on which it has an interface choose that interface as the bridge
to that host, so that in fact data flows the same way as if the host had
more than one number and the host-name to host-number lookup magically
chose the right number.
>>Subnet and host numbers.
Each machine has an identifying number. Since these are totally
arbitrary, I will assign some right now to avoid any confusion.
Subnet numbers
0 not used.
1 9th floor subnet
2 subnet that bridges between bldg 38 and 9th floor
Host numbers. These are probably wrong, since the
host number will be the same as the physical ethernet
address, which depends on the relative position
on the cable.
0 not used.
1 AI
2 ML
3 DM
4 MC
5 Micro automation
6 LOGO
7 Plasma physics
10 Chess machine
11 Lisp machine #1
12 Lisp machine #2
>>Packet numbers
Each time the sending user puts another packet into the network, this
number is increased by one. (These numbers are independent for the two
directions of a connection.) The receiver uses these numbers to get the
packets into order and ensure that there are no duplications nor
omissions. The packet numbers are 16 bits and wrap around to zero when
they overflow. When the connection is first opened, an initial value for
the packet# is established. If it was 0, then the packet# of the first
data packet would be 1.
The receiver returns packet numbers to the sender in the "ack packet #"
field of packets going in the opposite direction on the same connection.
The number in this field is the number of the latest packet successfully
received by the receiver. The sender need not make any further attempt to
send packets numbered less or equal to this. More on this below.
Packet #'s should be compared modulo 2**16. On pdp11's, use
	CMP A,B
	BMI <A is less>		(BMI rather than BLT or BLO)
On pdp10's, use
	SUB A,B
	TRNE A,100000		(rather than CAMGE A,B)
	 JRST <A is less>
On Lisp machines, use
	(AND (BITTEST 100000 (- A B))
	     <A is less>)	[rather than (AND (< A B) ...)]
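The same comparison, as an illustrative sketch in Python, equivalent to the
machine-dependent tests above:

```python
def pkt_less(a, b):
    # True iff packet number a precedes b modulo 2**16: a is "less" exactly
    # when bit 15 of (a - b) mod 2**16 is set, mirroring the pdp11 BMI and
    # pdp10 TRNE 100000 tests above.
    return ((a - b) & 0xFFFF) >= 0x8000
```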
>>Indices
Each connection has two indices assigned to it, one at each end. Each
index is an arbitrary 16-bit number assigned by the NCP at its end;
usually it is an index into that NCP's tables. Indices are required to be
non-zero. For maximum simplicity, all packets include both indices. The
receiver of a packet uses the destination index to find out who to give
the packet to. Generally the source index is used only for error
checking, but when a connection is first opened the source index has to be
saved and used as the destination index in future packets in the reverse
direction.
To prevent packets somehow left over from old connections from interfering
with new connections, we require that a certain time elapse between
successive connections between the same pair of hosts with the same index
numbers at each end, this time to be longer than the maximum time a packet
is reasonably expected to sit around in the network somewhere (the maximum
transit time through all bridges, etc.) This requirement is implemented
by making part of the index number be a "uniquizer"; when a host reuses a
slot in its tables, it increments the uniquizer, so that the index number
for that slot is not the same as it was previously. Then if the uniquizer
field is sufficiently wide, and the rate of creation of connections
(actually the rate of allocation of indices) is sufficiently low, the
requirement will be satisfied. For the kind of network we're talking
about, the time is several seconds, and the uniquizer need only be a few
bits wide. It is up to each host how wide it makes the uniquizer,
depending on how big it wants its tables to be. It is best if table slots
are also allocated circularly, so the same one is not reused immediately.
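As an illustration of the uniquizer scheme, here is a sketch in Python. The
3-bit/13-bit split and all names are invented examples; as stated above,
each host picks its own widths.

```python
UNIQ_BITS = 3     # invented example width for the uniquizer
SLOT_BITS = 13    # remaining bits of the 16-bit index select a table slot

class IndexTable:
    def __init__(self):
        self.uniquizers = [0] * (1 << SLOT_BITS)
        self.next_slot = 1    # slot 0 never used: indices must be non-zero

    def allocate(self):
        # Allocate slots circularly (skipping 0) so the same slot is not
        # reused immediately; bump the slot's uniquizer on each reuse so
        # the resulting index differs from the previous one for that slot.
        slot = self.next_slot
        self.next_slot = slot % ((1 << SLOT_BITS) - 1) + 1
        self.uniquizers[slot] = (self.uniquizers[slot] + 1) % (1 << UNIQ_BITS)
        return (self.uniquizers[slot] << SLOT_BITS) | slot
```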
A user process's "capability" or "channel" to a connection, used by it to
ask the NCP to operate on that connection, simply contains the appropriate
index.
Associated with each index the NCP has a "state", the host # and index #
of the other end of the connection, some read buffers and associated
variables, including a current packet #, and some write buffers and
associated variables, again including a current packet #.
The "state" can be Closed (no connection or other activity currently
associated with this index), Open (this index has a connection to another
index at another machine), RFC-sent (requested another machine for a
connection, but no answer yet), Listen (listening for a request for
connection to a certain contact name), Broken (connection closed
abnormally by network or machine lossage), and RFC-received (waiting for a
server process to get going and pick up a request for connection that came
in).
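A sketch of the per-index record just described; names and layout are
illustrative only, not a prescribed data structure.

```python
from enum import Enum, auto

class State(Enum):
    CLOSED = auto()        # no connection or other activity on this index
    OPEN = auto()          # connected to an index at another machine
    RFC_SENT = auto()      # requested a connection, no answer yet
    LISTEN = auto()        # listening on a contact name
    BROKEN = auto()        # closed abnormally by network or machine lossage
    RFC_RECEIVED = auto()  # waiting for a server process to pick up the RFC

class Index:
    def __init__(self):
        self.state = State.CLOSED
        self.foreign_host = 0    # host # of the other end of the connection
        self.foreign_index = 0   # index # of the other end
        self.read_pktno = 0      # current packet # for the read direction
        self.write_pktno = 0     # current packet # for the write direction
        self.read_buffers = []
        self.write_buffers = []
```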
>>Routing
This section is a place-holder. Initially routing will be kludged with a
fixed table. Once the network is set up automatic routing will be put in.
The routing decision consists of looking at the destination subnet field
of a packet and deciding which interface (on a multi-interface host) to
transmit it on, i.e. what is the best route to that subnet. In addition,
if the destination is not on a subnet to which there is a direct
interface, one must determine what is the host number of the bridge it
should be sent to.
It also involves recognizing packets which are destined to the current
host and receiving them.
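A minimal sketch of such a fixed-table routing decision; the table contents,
interface names, and host numbers here are invented examples.

```python
MY_HOST = 0x0101                 # example: subnet 1, host 1
ROUTE = {                        # subnet -> (interface, bridge host # or None)
    1: ('ether0', None),         # directly attached subnet: send to host itself
    2: ('ether0', 0x0104),       # reach subnet 2 via the bridge at 0x0104
}

def route(dst_host):
    # Packets addressed to this host are received locally; otherwise the
    # destination subnet (high byte) selects an interface and, if the subnet
    # is not directly attached, the bridge to forward through.
    if dst_host == MY_HOST:
        return ('local', None)
    interface, bridge = ROUTE[dst_host >> 8]
    return (interface, bridge if bridge is not None else dst_host)
```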
>>Operations
This section tells what the values of the opcode field in a packet are,
and how an NCP responds to each one.
1 RFC - Request for Connection
This message is sent from user to server in order to open a connection.
The data contains the contact name. Actually, if the data contains a
space, the characters before the space are the contact name and the
characters after the space are "arguments". This is done so that simple
transactions can work (see below).
The destination index is zero, because it is not known yet. The responses
are:
OPN, if a server process is found or created that wishes to accept the
request and open up a connection.
CLS, if the connection cannot be opened. The data field contains an ascii
explanation of why not.
ANS, if a simple transaction is occurring. The data contains the answer,
and no connection is ever created.
FWD, if there is no server process at this host, but there might be a
suitable one some place else.
There may also be no response, if the RFC was lost in the network, or the
destination host is down, or the reply was lost in the network. The
network software guarantees reliable transmission once a connection has
been established, but not reliable transmission of the control messages
that initially establish a connection, so appropriate time-outs must exist
(perhaps only in the user process that asks the NCP to issue an RFC).
The packet # field contains the first packet # that will be assigned to
data transmitted from the user process, minus one modulo 2**16. In the
simplest case, this can be zero, and the first packet sent will be packet
# 1. One might also imagine uniquizing the packet numbers as an extra
error check, but this should not be necessary, because the indices are
uniquized, and connections must be explicitly agreed to by each end before
data can flow.
The ack packet # field contains the initial window size for packets sent
to the sender of the RFC (see below). This field is used because there
can be no acknowledgement going on, since there is no connection yet.
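The contact-name/argument convention above amounts to splitting the RFC data
field at the first space; a trivial sketch (illustrative only):

```python
def parse_rfc_data(data):
    # Characters before the first space are the contact name; characters
    # after it are the arguments (empty if there is no space).
    name, _, args = data.partition(' ')
    return name, args
```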
2 OPN - Connection Open
This is the positive acknowledgement to RFC. The source index field
conveys the acknowledger's connection index to the requester. The packet
# field contains the first packet # that will be assigned to data
transmitted from the server process, minus one modulo 2**16.
The ack packet # field contains the initial window size for packets sent
to the sender of the OPN (see below). This field is used because there
can be no acknowledgement going on, since there is no connection yet.
3 CLS - Connection Closed
CLS is the negative response to RFC. It indicates that no server was
listening to the contact name, and one couldn't be created, or for some
reason the server didn't feel like accepting this request for a
connection, or the destination NCP was unable to complete the connection
(e.g. connection table full.) The destination index will be the source
index of the RFC. The source index will be zero because the NCP did not
put this connection into its tables. The data bytes, if there are any,
contain an ascii explanation.
CLS is also used to close a connection after it has been open for a while.
In the Arpanet, the NCP undertakes not to close the connection when the
user requests it, but waits until all data transfer has completed. This
is a source of extra complexity, since data transfer may be hung up, there
have to be timeouts, there have to be connections waiting to be closed
which aren't owned by any user, etc. It seems simpler to make CLS take
effect immediately, and let the user processes assure that data transfer
has been completed. Note that telnet-like applications don't need it, and
ftp-like applications have to have it separately from closing anyway.
4 FWD - forward a request for connection
This is a response to RFC which indicates that the desired service is not
available at the process contacted, but may be available at a
possibly-different contact name at a possibly-different host. The data
field contains the new contact name and the ack packet # field contains
the new host number. The issuer of the RFC should issue another RFC to
that address.
5 ANS - answer to a simple transaction
Simple transactions are transactions which consist of a single question
(request) and a single answer (response). They have no side effects, so
it is not necessary for either party to know whether the other party
thinks the transaction was completed or not. Simple transactions need no
flow control and no error control, other than a timeout by the user side
to detect lost messages. They are a simple, efficient way for doing
simple-minded things such as extracting information (such as the time or a
host name table) from a central server.
A simple transaction is initiated by sending an RFC to a server which
happens to use simple transactions rather than full connections. The data
field of the RFC consists of the contact name of the server, optionally
followed by a space and arguments to the request. The server responds by
sending an ANS packet, whose data field is the answer. The destination
address of the ANS comes from the source address of the RFC. The source
address, packet #, and ack packet # fields of the ANS are not used and
should be zero.
The difference between simple transactions (2 packets) and full datagrams
(4 packets) is that in the simple transaction the two parties don't have
to agree about whether or not the transaction in fact took place, while in
the full datagram they do, so acknowledgement is necessary.
200-377 DATA - Transmits Data
The data portion of the packet is data being sent through the connection.
The packet # is a number that increments by one for each data packet sent
in this direction on this connection. This is used to detect lost packets
(which includes packets garbled in transmission and packets lost in the
statistical flow control scheme) and duplicated packets (caused by lost or
delayed acknowledges). The NCP undertakes to deliver the packets to the
destination process in the same order that they came from the source
process, with no duplications and no omissions. Note that any opcode with
the sign bit on is a data packet as far as the NCP is concerned; if they
wish, higher-level protocols may use the opcode field to define various
different kinds of data packets. Thus, what is herein called a data
packet may be a "control" packet to a higher-level protocol.
6 NOP - no-operation
This type of packet does nothing special to itself. However, like all
packets it contains an "acknowledge packet number" field. The main use
for NOP packets, then, is as a vehicle to convey acknowledgements. The
packet# field of NOP is not used (i.e. NOPs are not themselves
acknowledged.)
7 WIN - set window size
The packet# field contains the window size desired for packets sent
through this connection to the sender of this packet, i.e. in the reverse
direction from the direction the WIN was sent. Since this is an
unacknowledged packet, the packet# field can be usurped from its normal
use in this way.
The window size is the maximum number of outstanding unacknowledged data
packets which the sender may have at any one time. If the sending user
process tries to transmit additional data packets on this connection, it
should be made to wait until some packets have been acknowledged.
The intention of the window size is to regulate how often "acknowledges"
must be returned. It's the same as in DSP and TCP. This is explained in
more detail below.
No sender is actually required to pay any attention to the window size.
No receiver is actually required to set the window size to something
reasonable. However, those hosts that want to maximize performance should
do something about the window size. The size is initially set during the
RFC/OPN dialogue, presumably according to the type of protocol being used.
An NCP may, if it chooses, use the WIN packet to dynamically adjust the
window size according to observed network behavior. (See dynamic
window-size adjustment section below.)
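The window rule above can be sketched as follows; this is illustrative only
(a real NCP would also keep the unacknowledged packets around for
retransmission).

```python
class Sender:
    # Sender-side window enforcement: at most `window` packets may be
    # outstanding (sent but not yet acknowledged) at any one time.
    def __init__(self, window, first_pktno=0):
        self.window = window
        self.last_sent = first_pktno    # packet # of last packet sent
        self.last_acked = first_pktno   # packet # of last packet acknowledged

    def outstanding(self):
        return (self.last_sent - self.last_acked) & 0xFFFF

    def may_send(self):
        return self.outstanding() < self.window

    def send(self):
        # The user process must wait when the window is full.
        assert self.may_send()
        self.last_sent = (self.last_sent + 1) & 0xFFFF
        return self.last_sent

    def ack(self, ackno):
        # ackno is the "ack packet #" field of an incoming packet.
        self.last_acked = ackno
```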
10 WTS - window too small
The sender sends a WTS packet if it finds its buffer full, a situation
which indicates that either the window size is too small or the sender is
faster than the receiver. See the flow control section. The rate of
transmission of WTS packets has to be strongly limited, since the
condition which causes them will not go away right away and may not go
away at all if the receiver is slower than the sender.
11 LOS - you are losing
If a host receives a packet for a connection that does not exist (other
than RFC, which isn't associated with a particular connection, and LOS and
CLS, which are safe to ignore), it should interchange source and destination,
change the opcode to LOS, insert an ascii explanation of the problem in
the data field, and send the packet back. A host receiving a LOS should
break the connection specified by the destination index and inform the
associated process that something has gone wrong. It should make the LOS
packet available to that process so the explanation of the problem can be
read.
The LOS packet isn't actually necessary, since if a packet is
re-transmitted many times without being acknowledged, the NCP should give
up, close the connection, and inform the user that the foreign host
appears to be dead.
For debugging, an echo feature is implemented as follows. If you send a
packet with a data opcode and source and destination indices of 0, it
should be sent back as a LOS packet. The data field will be destroyed by
the error explanation, but the packet # and ack packet # fields can be
used to remember any sequencing or checking information.
12 LSN - listen (never transmitted through the net, see below)
>>Table of packet field use vs opcode
The unused field is never used and must be zero; the forwarding count
field is always used in the same way; and the nbytes field is always the
length of the data. Fields marked "0" below are "don't care" rather than
"must be zero", but zero is certainly recommended.
                Destination      Source
Opcode          Host    Index    Host    Index   Packet#      Ack pk#      Data
------          ----    -----    ----    -----   -------      -------      ----
RFC             usual   0        usual   usual   first - 1    window size  contact name
OPN             usual   usual    usual   usual   first - 1    window size  0
CLS             usual   usual    usual   0       0            0            reason
ANS             usual   usual    usual   0       0            0            answer
FWD             usual   usual    usual   0       0            new host #   contact name
NOP             usual   usual    usual   usual   0            usual        0
WIN             usual   usual    usual   usual   window size  usual        0
WTS             usual   usual    usual   usual   0            usual        0
LOS             src h   src i    dst h   dst i   pk#          ack pk#      reason
LSN             0       0        0       0       0            0            contact name
Data            usual   usual    usual   usual   usual        usual        data
>>Flow and Error Control
The NCPs conspire to ensure that data packets are sent from user to user
with no duplications, omissions, or changes of order.
Each receiver (each end of each connection is a receiver, and also a
sender; think of receivers and senders as little actors inside the NCP)
has a list of buffers containing packets which have been successfully
received and are waiting to be read by the user process, and two packet#
variables. One is the number of the last packet read by the user process.
The other is the number of the last packet which has been acknowledged.
If these two are not equal, the receiver needs to send an acknowledgement
"soon."
The received-packet list needs to be kept sorted by packet number, and the
NCP has to make sure that the user process does not see duplicates or
omissions. If packets arrive out of order, the NCP has to sort them.
This means that the user process may not be able to read a packet even
though the receive list is non-empty, because the first packet in the
receive list is not the successor of the last packet read by the user
process.
A good way to do this is to have 2 lists of received packets, each of
which is sorted. The first list contains those packets which the user
process may read; the first in the list is the successor to the last
packet read, and there are no gaps in the list. The second list is the
remaining packets, which the user may not read until some other packet
arrives to fill in the gap. Each list needs a pointer to head and tail,
and the packet number of the next packet to be appended to the tail.
It is not actually a requirement that an NCP support out-of-order packets,
rather than simply discarding them and hoping that they will be
retransmitted in order, but if it's not very hard to do one might as well
do it.
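The two-list scheme can be sketched as follows. This is illustrative only;
here the second list is kept as a dictionary keyed by packet number rather
than a sorted list, which is equivalent for this purpose.

```python
class Receiver:
    # `ready` holds a gap-free run starting right after the last packet
    # read by the user process; `pending` holds later packets that may not
    # be read until some other packet arrives to fill in the gap.
    def __init__(self, last_read=0):
        self.next_ready = (last_read + 1) & 0xFFFF  # pktno that extends `ready`
        self.ready = []
        self.pending = {}                           # pktno -> data

    def receive(self, pktno, data):
        self.pending[pktno] = data
        # Move any now-contiguous run of packets over to the readable list.
        while self.next_ready in self.pending:
            self.ready.append(self.pending.pop(self.next_ready))
            self.next_ready = (self.next_ready + 1) & 0xFFFF
```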
Acknowledgements are sent by putting the last-received variable into the
"ack packet #" field of an outgoing packet on the opposite direction of
the appropriate connection, and copying the last-received variable into
the last-acknowledged variable. Where does the outgoing packet come from?
First of all, all outgoing data, NOP, WIN, and WTS packets automatically
carry acknowledgement for the reverse direction of their connection. So
if an outgoing packet happens to be sent at a time when an acknowledgement
is necessary, that takes care of it.
Secondly, if the number of outstanding unacknowledged packets is more than
1/2 the window size, a NOP should be generated and sent immediately to
acknowledge those packets before the sender fills up the window.
Thirdly, the "soon" of four paragraphs back is implemented by a timeout in
the NCP. If an acknowledgement remains required for a certain amount of
time, a NOP should be generated and sent to carry it. The appropriate
time interval is 1 second, I would guess. This timeout does not have to
be too exact, however.
The reason for having a timeout here, rather than just sending an
acknowledgement right away, is two-fold. It allows "batching" of
acknowledgements, where a single packet can be used to acknowledge many
packets, which halves the network traffic caused by bulk data transfer.
It also allows "piggy-backing" of acknowledgements on data packets, which
(for instance) decreases the network traffic caused by remote-echoing
telnet connections.
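The acknowledgement rules above (piggy-backing on outgoing packets, the half-window threshold, and the roughly 1-second timeout) can be sketched as follows; the class and field names are invented for illustration:

```python
ACK_TIMEOUT = 1.0    # "soon": the text guesses 1 second, not exact

class AckState:
    """Sketch of the receiver's acknowledgement bookkeeping: one
    variable for the last packet given to the user, one for the
    last packet acknowledged (illustrative, not any real NCP)."""

    def __init__(self, window_size):
        self.last_received = 0      # last packet # read by user process
        self.last_acked = 0         # last packet # acknowledged
        self.window_size = window_size
        self.ack_due_since = None   # when an ack first became pending

    def packet_read_by_user(self, pktnum, now):
        self.last_received = pktnum
        if self.ack_due_since is None:
            self.ack_due_since = now

    def stamp_outgoing(self, packet, now):
        # piggy-back: every outgoing packet carries the current ack
        packet['ack'] = self.last_received
        self.last_acked = self.last_received
        self.ack_due_since = None

    def needs_nop(self, now):
        # generate a NOP if unacknowledged packets exceed half the
        # window, or an ack has been pending longer than the timeout
        unacked = self.last_received - self.last_acked
        if unacked > self.window_size // 2:
            return True
        return (self.ack_due_since is not None
                and now - self.ack_due_since >= ACK_TIMEOUT)
```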
There is also something called a "null acknowledgement", which is an
acknowledgement which does not acknowledge any packets which weren't
acknowledged already, carried on a NOP. This sounds useless, but actually
it serves a purpose. A sender decides that the destination host is down
if it has been waiting for an acknowledgement for a long time (30
seconds?) and hasn't received one. If the receiving host is up, but the
receiver process is stopped or not asking for any more network input
(perhaps it is being interactively debugged) the sender should not decide
it is down. So a receiving NCP needs to recognize this case and send a
null acknowledgement. If the received-packet list for a connection
remains non-empty for a certain time (10 seconds?), the NCP should make
the connection require an acknowledgement by changing the
last-acknowledged variable to be one less than (modulo 2^16) the
last-received variable, i.e. should send a null acknowledgement.
When a receiver receives a data packet, it compares the packet # of that
packet with the last-received variable. If it is less or equal (modulo
2^16), it is a duplicate of a packet already given to the user, and should
be discarded (or it is at least 30000 packets early, which is unlikely.)
If it is greater, it is sorted into the received-packet list at the
appropriate position (if it has the same packet# as a packet already in
that list, it is a duplicate and is discarded.) It is NOT acknowledged at
this time; no packet is ever acknowledged until it has been given to the
user process ("end to end acknowledgement"). Since a packet on the
received packet list has not yet been acknowledged, it may be safely
discarded at any time if the operating system runs out of buffer space.
Also, if the receiving user process is not listening to the net, the NCP
cannot be swamped with arbitrary numbers of packets, since the sending
user process is not supposed to send more than window-size packets past
the last one the receiving user process read.
Note that the most likely cause of packet duplication is that an
acknowledge was lost, so whenever a duplicate packet is discarded, the
last-acknowledged variable should be set to one less than the
last-received variable so that a new acknowledge will be sent soon.
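The modulo-2^16 "less or equal" test can be sketched as follows; splitting the number space in half is an assumption on my part, consistent with the "at least 30000 packets early" remark above:

```python
MOD = 1 << 16    # packet numbers wrap modulo 2^16

def pkt_le(a, b):
    """True if packet number a is less than or equal to b, modulo
    2^16.  a counts as 'behind' b when the forward distance from a
    to b is less than half the number space; a packet that appears
    'behind' is either a genuine duplicate or absurdly far ahead
    (the text's "at least 30000 packets early" case)."""
    return (b - a) % MOD < MOD // 2

def is_duplicate(pktnum, last_received):
    # a data packet at or behind last_received is a duplicate
    return pkt_le(pktnum, last_received)
```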
The sender has a list of packets which have been entrusted to it by the
user for transmission and one packet # variable, the number of the last
packet sent by the user. When the user next sends a packet, the sender
will increment this variable and set the packet# of the sent packet to the
result. The sender also sets the source and destination host numbers and
indices of the packet, sets the "ack packet #" to the last-received
variable of its corresponding receiver, sets the receiver's last-
acknowledged variable to that, chooses a transmission medium by checking
the routing tables, and gives the packet to the transmission medium for
"immediate" transmission (perhaps it has to wait its turn in a queue.) It
also saves the packet on a list, in case retransmission is required.
With each buffered packet the sender holds in trust, it remembers the time
that packet was last transmitted. From time to time "retransmission"
occurs. The sender gives one or more packets from its list to the
transmission medium. It always starts with the oldest, so as to keep
things in order, and sends the rest in order until it gets to one that was
transmitted too recently to do again. Retransmission is used to recover
from lost or damaged packets, lost or damaged acknowledgements, and
packets discarded by the receiver due to lack of buffering capacity.
Each time a receiver receives a packet, it gives the "ack packet #" from
that packet to its corresponding sender. The sender discards any packets
with numbers less than or equal to that, since their successful receipt
has just been acknowledged.
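A minimal sketch of the sender's retransmission list and acknowledgement processing, with an assumed 6-second retransmission interval (the text suggests 2 to 3 times an estimated 2-to-3-second round trip); names are invented:

```python
RETRANSMIT_INTERVAL = 6.0   # assumed: 2-3x an estimated round trip

class Sender:
    """Sketch of the sender's list of entrusted packets.  Packets
    stay on the list, oldest first, until acknowledged."""

    def __init__(self):
        self.unacked = []    # [pktnum, last_tx_time], oldest first

    def send(self, pktnum, now, transmit):
        transmit(pktnum)
        self.unacked.append([pktnum, now])

    def retransmit(self, now, transmit):
        # start with the oldest, to keep things in order, and stop
        # at the first packet transmitted too recently to do again
        for entry in self.unacked:
            if now - entry[1] < RETRANSMIT_INTERVAL:
                break
            transmit(entry[0])
            entry[1] = now

    def process_ack(self, ack):
        # discard everything at or below the acknowledged number
        # (plain integer compare here; a real NCP compares mod 2^16)
        self.unacked = [e for e in self.unacked if e[0] > ack]
```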
A transmitter keeps re-transmitting packets until they get acknowledged.
Since the acknowledge is from the ultimate receiver to the original
sender, packets will eventually get sent correctly even if some link
somewhere in the network occasionally loses packets. This end to end
acknowledgement also allows flow control to be done by ignoring packets
for which the receiver does not have buffer space, which is considerably
simpler than "allocation" schemes, and allows a higher rate of throughput
[probably].
There are no negative acknowledgements in this scheme. If a packet is
garbled, how do you know who to send the negative acknowledgement to?
(But see the LOS packet.)
>>Dynamic Window-size Adjustment
Permit me to stress that this stuff is optional for small NCPs.
The goals of flow control are:
1. Error recovery.
2. If the receiver is faster than the sender, avoid unnecessary delays in
transmission due to having to wait for an acknowledge or having to wait
for the sender process to wake up.
3. If the sender is faster than the receiver, minimize retransmissions due
to receive buffer overflow.
4. Minimize the number of otherwise-useless packets generated to carry an
acknowledgement or a window-size negotiation, and minimize useless
retransmissions.
Consequences of the goals:
1. All packets will be retransmitted until acknowledged.
2. The sending NCP must buffer several packets, and packets must be
acknowledged in groups, not one-by-one.
3. If the receiver is slow, something must force the sender not to send
packets too fast.
4. The interval between retransmissions should not be too small. It may
be desirable for it to increase if the receiving process is not listening
for some reason.
The window size is the maximum number of packets which may be in the
network at one time (for one direction of one connection). "In the
network" means output by the sending process and not yet input by the
receiving process. (These processes are the entities which determine the
rate, unless they are so fast that the network slows them down.)
The window size is not the number of packets acknowledged at a time; for
best operation the latter must be 1/2 to 1/3 of the former. See below.
If the sending process is slow (and determines the rate), things are
relatively simple. We just have to have a big enough window size and
frequent enough acknowledgement to cover for sending process wakeup
delays.
If things are not limited by the sender, then
Window size
Flow rate = ---------------
Round trip time
Round trip time = time to wake up sender process (multiplied
by the fraction of the time this
is necessary)
+ time packet is sitting in sender buffers
before it is transmitted
+ transit time through the net
+ time packet is sitting in receiver buffers
before it is read; this is the maximum
of time to process previous packets
and time to wakeup sender process
+ time until acknowledge is generated and sent
+ transit time through the net
The round trip time is the time between when packet N is output by the
sending process and when it is acknowledged, permitting the sending
process to output packet N+WS.
The main variable components of the round trip time are the delay before
acknowledgement and the delay waiting in the receiver buffer for packets
to be processed. If these were zero, the round trip time would consist of
two process wakeups and two network transit times (determined by the delay
waiting for the cable and waiting for previous packets from this host to
be transmitted, the time needed to load and unload the interface in the
buffer, and the actual transmission time, multiplied by the number of
bridges in the path.)
This ideal round trip time is probably on the order of 2 seconds. The
timeout for retransmission should be 2 to 3 times the round trip time.
The timeout for acknowledgement should be 1/3 to 1/2 the round trip time.
One could either measure the actual round trip time, or use an estimate of
say 3 seconds, a little higher than the ideal. It would be a good idea to
measure the round trip time in any case, which is done by getting the
elapsed time since transmission when a packet is discarded due to its
being acknowledged, and averaging that.
The receiver process should initially set the window size to the maximum
flow rate it wants to handle times the desirable round trip time.
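The round-trip-time measurement and the initial window-size choice can be sketched as follows; the exponential smoothing factor is my own illustrative choice (the text only says "averaging"), and the timeout multipliers are picked from within the ranges given above:

```python
class RttEstimator:
    """Sketch of round-trip-time measurement: when a packet is
    discarded because it was acknowledged, fold the elapsed time
    since its transmission into a running average."""
    ALPHA = 0.125    # assumed smoothing factor, not from the text

    def __init__(self, initial_estimate=3.0):   # text's ~3 s estimate
        self.rtt = initial_estimate

    def sample(self, sent_at, acked_at):
        elapsed = acked_at - sent_at
        self.rtt += self.ALPHA * (elapsed - self.rtt)

    def retransmit_timeout(self):
        return 2.5 * self.rtt      # 2 to 3 times the round trip

    def ack_timeout(self):
        return 0.4 * self.rtt      # 1/3 to 1/2 the round trip

def required_window(target_rate_pps, round_trip_s):
    """Flow rate = window size / round-trip time, so the receiver
    should initially pick window = desired rate x round-trip time."""
    return max(1, round(target_rate_pps * round_trip_s))
```

For example, a receiver wanting 5 packets per second over a 2-second round trip would ask for a window of 10 packets.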
Symptoms of improper window size:
If the window-size is too large, the round trip time becomes long due to
packet processing delay in the receiver buffer. (There will be many
packets in the receiver buffer, and the receiver will be processing them
slowly.) The long round-trip time will cause unnecessary retransmissions.
Retransmissions could also be caused by the NCP's discarding received
packets due to insufficient core to buffer them.
If the window-size is too small, excessive process blocking and waking up
occurs. The receiver process often empties its buffer and has to block
until more packets arrive. The sender process often fills up its buffer
and has to block until some of the buffered packets are acknowledged. A
small window size also causes acknowledgements to have to be sent more
frequently than necessary. Note that from the receiver's point of view it
is impossible to distinguish between the window size being too small and
the sending process being too slow.
Here is a scheme for dynamic adjustment of the window size:
Note that window-size adjustments cannot take effect (in the sense of
fixing the symptoms) immediately, so it is necessary to limit the rate at
which the window size is adjusted.
When the receiver receives (and discards) a duplicate of a packet it
already has in its buffer, this indicates either that an acknowledgement
was lost or that the window size is too large. Since packets are assumed
not to be lost very often, we may as well assume the window size is too large
and send a WIN packet to decrease it. Another possibility would be to
have the sender detect the long round-trip time and complain to the
receiver, who could adjust the window size. The receiver must not
decrease the window size again until all packets currently buffered have
been read and acknowledged, indicating that the sender has had a chance to
decrease the number of packets buffered at its end. A reasonable amount
to decrease the window size by is 1/3 of its current value.
When the sending process wants to output a packet, but the number of
packets already buffered is greater than or equal to the window size, it
should send a WTS, indicating that the problem is too small a window size
or too slow a receiver rather than too slow a sender. When the receiving
process wants to input a packet, but the buffer is empty, and a flag is
set indicating that a WTS has been received, it should send a WIN packet
adjusting the window size upward by 1/2 of its current value (and clear
the WTS-received flag). This is rate-limited by preventing the sender
from sending a second WTS until all the packets buffered at the time the
first WTS was sent have been acknowledged, indicating that the receiver
has had time to act on the first WTS.
The variables required are: for each of the sending and receiving sides, a
packet number which has to be acknowledged before WTS or WIN can be sent
again, and a flag saying whether this number is operative; also, a
WTS-received flag in the receiver.
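A sketch of these adjustment rules, using the 1/3-decrease and 1/2-increase amounts from the text; the rate-limiting is simplified here to a single "hold until this packet number is acknowledged" variable per connection, and all names are invented:

```python
class WindowControl:
    """Sketch of dynamic window-size adjustment: shrink by 1/3 when
    a duplicate of an already-buffered packet arrives, grow by 1/2
    when the receiver runs dry after a WTS, each change held off
    until previously buffered packets have been acknowledged."""

    def __init__(self, window):
        self.window = window
        self.wts_received = False
        self.hold_until = None   # packet # that must be acked first

    def _held(self, last_acked):
        if self.hold_until is not None and last_acked < self.hold_until:
            return True
        self.hold_until = None
        return False

    def on_duplicate(self, last_acked, newest_buffered):
        # a duplicate of a buffered packet suggests the window is
        # too large; returns the new size to send in a WIN packet
        if self._held(last_acked):
            return None
        self.window -= self.window // 3
        self.hold_until = newest_buffered
        return self.window

    def on_receiver_idle(self, last_acked):
        # receiver wants input, buffer is empty, and a WTS arrived:
        # the window (or the receiver) is the bottleneck, so grow it
        if not self.wts_received or self._held(last_acked):
            return None
        self.wts_received = False
        self.window += self.window // 2
        return self.window
```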
It is important to meter the performance of this mechanism and find out
whether it does anything and whether what it does is right.
Consider the possibilities of changing this into a more symmetric and
negotiation-based scheme, where the sender always initiates window size
changing and the receiver either agrees or ignores the request. Consider
using elapsed time as an additional rate-limiter (have to use the other
thing, too, so idle connections don't keep changing window size; this may
be deleteable if it is always sender-initiated.)
More notes on the subject of window-size too small. This is identical to
receiver too slow. The net flow rate out of the sender is trying to be
higher than that into the receiver, so packets pile up in buffers at each
end. The round-trip time becomes arbitrarily high to preserve the equation,
dividing the window size down enough to match the receiver's flow rate.
The situation where the window-size is too small and we want to do
something about it has to be distinguished from two other situations.
One, the receiver is accepting packets slowly but the sender is also
sending them slowly. We don't want to change the window-size, because it
doesn't matter since packets aren't piling up, and at any time they might
both decide to go fast. Two, the receiver's net flow rate is high, but
its response time is long (it's taking packets in bursts). Here the
round-trip time is still long, but making the window size smaller would
make things worse.
The symptoms that distinguish the case where we want to make the
window-size smaller are: the round-trip time is long, the sender buffer
is full, and the number of packets acknowledged at a time is small
compared to the window size. Actually, the last two are sufficient: if the
acknowledgement batch size is small, and we know it's not the sender's
fault, we may as well decrease the window size to save buffer space and
decrease the round-trip time.
>>Uniquization
To avoid problems with packets left over from old connections causing
problems with new connections, we do two things. First of all, packets
are not accepted as input unless the source and destination hosts and
indices correspond to a known, existent connection. By itself, this
should be adequate, provided that retransmission is only done by the
originating host, not by intervening gateways and bridges in the network.
This is because we can safely assume that when a host agrees to open a
connection with a certain index number at its end, it will give up on any
previous connection with the same index, therefore it won't retransmit any
old packets with that index once it has sent out a new RFC or OPN. The
indications are that our network will be "local" enough that indeed
retransmission will only be done by the original host.
Problems could still occur if packets get out of order, so that an OPN
establishing a new connection gets ahead of a data packet for an old
connection with the same index. To protect against this, it is necessary
to assure that at least a few seconds elapse before an index number is
reused. This could be done either by remembering when an index is last
used, or by reserving part of the 16-bit index number as a uniquization
field, which is incremented each time an otherwise-the-same index is
reused. This field needs to be big enough to cover the maximum delay of
an old data packet with the same index; its size depends on the rate of
creation of connections. Which method is chosen is at the discretion of
each local NCP. Another necessary assumption is that when a system
crashes and is reloaded (thus forgetting any remembered information about
which indices were in use when, and so forth), the time to reload it is
more than a few seconds.
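A sketch of the uniquization-field alternative; the 4-bit/12-bit split of the 16-bit index is an assumed example for illustration, not something the protocol specifies:

```python
UNIQ_BITS = 4                 # assumed split of the 16-bit index
SLOT_MASK = (1 << (16 - UNIQ_BITS)) - 1

class IndexAllocator:
    """Sketch of index uniquization: the high bits of the 16-bit
    index are a counter bumped each time the same slot is reused,
    so a stale packet for the old connection fails the index match."""

    def __init__(self):
        self.uniq = {}        # slot -> current uniquizer value

    def allocate(self, slot):
        u = (self.uniq.get(slot, -1) + 1) % (1 << UNIQ_BITS)
        self.uniq[slot] = u
        return (u << (16 - UNIQ_BITS)) | (slot & SLOT_MASK)

def same_slot(index_a, index_b):
    # two full index numbers refer to the same underlying slot
    return (index_a & SLOT_MASK) == (index_b & SLOT_MASK)
```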
Problems could occur not only with left over data packets, but also with
left over control packets. This isn't too much of a problem since control
packets are not retransmitted, but it could still happen that a host gets
faked out into thinking that it has a connection to another host that the
other host doesn't know about. In this case, it should just look like the
connection was opened and then either the other host went down or the
connection was broken by a LOS packet, since the other host won't generate
any data packets and won't accept any.
>>Media handlers
A host may be connected to more than one transmission medium. It has
service programs for each.
When a packet is received that is not addressed to this host, the
forwarding count should be incremented. If it doesn't overflow, the
packet should be sent back out according to the routing tables, otherwise
it should be discarded. Normally it would not be useful to send a packet
back out on the same subnet it came in on, but we may as well let the
forwarding count catch this along with other loops.
When a packet is received, if the opcode is RFC, it is handled specially.
The contact name is compared against those of all the indices which are in
the Listening state. If a match is found, that index is put into the
RFC-received state, its LSN packet is discarded, and the RFC packet is put
into its input list so that the server process can see it. If no server
is listening to that contact name, the RFC packet is placed on the
pending-RFC list, and (in the case of ITS) a server process is created
which will load itself with a suitable program to open an index in
"server" mode, gobble an RFC packet, look at the contact name, and either
reload itself with the appropriate server program or send a CLS reply.
When a non-RFC packet is received, the system must look for a receiver
index to handle it. If none is found, or the state is wrong, or the
source host and index don't match, a LOS should be sent unless the
received packet was a LOS. Otherwise, if the received packet is WIN, WTS,
or NOP, it is processed and discarded. Other packets are given to the
user; OPN, CLS, and LOS cause a state change but are also given to the
user as input.
The transmitting side of a transmission medium handler has a queue of
packets to be transmitted. It should send them out, in order, as fast as
possible, except that if a receiving host has no buffer space (which can
be detected because its chaosnet interface will cause "interference" on
the ether), it should look down the list for another host to send to. As
long as packets to the same host are sent in the order they are queued,
everything will be all right. (Actually, this normally shouldn't matter
much.) In addition, when the packets are put into the transmit queue, the
destination host number has to be looked up in a table to determine which
transmission medium to use to get to it and (in the case of ether) which
physical host number to put in the packet trailer for the hardware.
>>Buffers
In ITS, the buffering scheme will be as follows. There will be a pool of
32-word packet buffers available. When it runs out, more can be made;
when there are many free, some can be flushed. 32-word buffers are made
out of 1024-word pages (adding a new page type), rather than using the
existing 128-word buffer mechanism, because there is a limited number of
128-word buffers, and that mechanism is a little painful to use. There
are likely to be several hundred packet buffers (say 12K of core) in use
when high-speed (e.g. mag-tape speed) file transfer is going on.
Each packet buffer has a two-word header, and 30 words which can hold a
packet. Packet buffers can be on one (or sometimes two) of six lists:
1. The free list.
2. The receive list, of which there are two for each index: one for
packets which the user may safely read and one for out-of-order packets
which are awaiting the arrival of an earlier packet before the user may
read them.
3. The send list, of which there is one for each index.
4. The transmission list.
5. The pending-RFC list.
6. The pending-LSN list.
The free list contains packet buffers which are free. They are threaded
together through addresses in the first word. Zero ends the list.
The transmission list contains packets which are to be transmitted out on
the network "immediately". At interrupt level packets are pulled off of
this list and sent. (There might be more than one transmission list if a
machine is connected to more than one physical medium.) The transmission
list is threaded through addresses in the left half of the first word.
Zero ends the list. After transmission -1 is stored to indicate that the
packet is not on the transmission list any more. If the right half of the
first word is -1, indicating that the packet is not also on a send list,
it is returned to free.
Each send list contains packets for a particular connection which have
been entrusted to the system by the user to be sent, but have not yet been
acknowledged. They are threaded together through the right half of the
first word. The second word contains the time that the packet was last
transmitted (actually, the time that it was last put on the transmission
list.)
Each receive list contains packets which have been received on a
particular connection and not yet read by the user. They are threaded
together by addresses in the first word, and the list ends with zero. The
receive lists must be kept sorted by packet number.
The pending-RFC list contains request-for-connection packets which have
not yet been either accepted or rejected. They are threaded together
through the first word. When a server process finishes getting created
and loaded, it will take an RFC off the pending-RFC list and put it on its
own receive list. The second word of these packets contains the time
received so that the system can know when something has gone wrong and
they should be thrown away.
The pending-LSN list contains LSN packets for all the listening users.
These packets are just used as a handy place to save the contact name
being listened to. They are threaded together through the first word.
The source-index field in the packet header can, of course, be used to
find which user this packet belongs to.
>>ITS System Calls
(Other systems would have similar calls, with appropriate changes for
their own ways of doing things.)
OPEN
Not allowed. (I said this wasn't a "standard" device!)
Instead use:
CHAOSO
arg 1 - receive channel number
arg 2 - transmit channel number
First, the two specified channels are closed. Then an index
is assigned to the user and the two channels are set up to
point to it. Two channels are used since in general ITS
channels are unidirectional, and to allow the user to
handle receive and transmit interrupts differently.
The created index is placed in the Closed state. To set up
a connection, IOT an RFC or LSN packet down the transmit
channel.
IOT
Always transfers exactly one packet. The effective
address of .IOT is the address of a 30.-word block
which contains the packet. .CALL IOT should be given
an immediate second argument which is the address of
the 30.-word block. .CALL SIOT is not allowed.
The format of the 30.-word block is:
16 16 4
-----------------------------------------
| opcode | unused | fc | nbytes | 0 |
-----------------------------------------
|destination host |destination index| 0 |
-----------------------------------------
| source host | source index | 0 |
-----------------------------------------
| packet # | ack packet # | 0 |
-----------------------------------------
| data1 | data2 ...
... data104 |
-----------------------------------------
In the descriptions below, if an error is said to
occur that means IOC error 10. (channel in illegal mode)
is signalled.
In the case of an output IOT, the user sets only
the opcode, nbytes, and data-n fields. When the
NCP copies the packet into a buffer in the system
it sets the other fields of the header to the
appropriate values.
This is not completely true. When outputting an RFC,
the user sets the destination host field, and sets the
ack packet # to the receive window size desired. The user
also sets the window size when outputting an OPN.
The NCP checks for the following special values
in the opcode field of a packet output by the user:
RFC - error if the index is not in the Closed state.
The packet is transmitted (but not queued for
possible retransmission) and the index enters
the RFC-sent state. The user should do an input
IOT which will wait for the OPN, CLS, FWD, or ANS reply
packet to arrive. The NCP also copies and saves
the user-specified host number and window size.
LSN - error if the index is not in the Closed state.
It is put into the Listen state. The packet
is not transmitted, but it is saved so that
when an RFC comes in the system can compare
the contact names. (Note- LSN is a special
opcode which is never actually transmitted
through the net.) The pending-RFC list is searched
to see if an RFC with the same contact name has
been received. If so, it is given to this index
as if it was received just after the LSN was
sent out.
OPN - error if the connection is not in the RFC-received
state. It is put into the Open state. The
packet is transmitted (but not queued for
retransmission, since until it is received
the other end does not know what index to
send acknowledgements to.) The system also
copies and remembers the window size.
CLS - error if the connection is not in the Open
or the RFC-received state. It is put into
the Closed state and the packet is transmitted
(but not queued for retransmission). This packet
may optionally contain data bytes which are
an ascii excuse for the close.
FWD - error if the connection is not in the RFC-received
state. The packet is transmitted, but not queued
for retransmission, and the connection is put into
the Closed state.
ANS - error if the connection is not in the RFC-received
state. The packet is transmitted, but not queued
for retransmission, and the connection is put into
the Closed state.
200 or higher - This is a data packet. Error if the
connection is not in the Open state. A packet#
is assigned, the destination and source fields
are filled in, and the packet is transmitted and
queued for retransmission.
Any other opcode causes an error.
In the case of an input IOT, the user will get an error
if the connection is in the Closed or Broken state,
except if it is in the Closed state and there are data
packets queued. This is so that the user can read the
CLS packet. Otherwise, it will hang until a packet
arrives, then return the packet into the user's
30.-word block.
The user should check the sign bit of the first word,
which will be set if this is a data packet. The
non-data packets which can get given to the user are
RFC, OPN, FWD, ANS, LOS, and CLS. (You shouldn't be
surprised if you get something else, though!)
CLOSE
Immediately closes the connection. All buffers and other
information associated with the index are discarded. Normally
the user should first IOT a CLS
packet containing an ascii explanation for why it is
closing. Note that any data previously written on the
connection but not yet received by the other end will be
lost. User programs should exchange "bye" commands of some
sort before closing if they care about losing data. It is
done this way to keep the NCP simple.
RESET
Does nothing.
FORCE
Does nothing.
FLUSH
On an output channel, does FORCE and then waits until
there are no queued output buffers. I.e., waits for
all output to be received and acknowledged by the foreign
host.
RCHST
val 1 SIXBIT/CHAOS/
val 2 0
val 3 0
val 4 0
val 5 -1
RFNAME
val 1 SIXBIT/CHAOS/
val 2 0
val 3 0
val 4 0
val 5 0 or 1 ?
WHYINT
val 1 - %WYCHA
val 2 - state
val 3 - number of packets queued (receive,,transmit)
val 4 - window size (receive,,transmit)
LH(val 3) is the number of packets available to input IOT.
This is different from the number of received packets
if some are out of order.
RH(val 3) is the number of packets which have been transmitted
by output IOT but which have not yet been received and
acknowledged by the foreign host.
The state codes are:
%CSCLS Closed
%CSLSN Listen
%CSRFC RFC-received
%CSRFS RFC-sent
%CSOPN Open
%CSLOS Broken by receipt of "LOS" packet.
%CSINC Broken by incomplete transmission (no acknowledge
for a long time)
NETBLK
Similar to Arpanet NETBLK.
STYNET
This should work the same as on the Arpanet. It will
not be implemented initially, however.
CHAOSQ
arg 1 - address of a 30.-word block (packet buffer)
This is a special system call for use by the ATSIGN CHAOS
program, which is a daemon program that gets run when
an RFC is received that does not match up against an
existing LSN.
The first packet on the pending-RFC queue is copied
into the packet buffer, then moved to the end of the
queue (so that the right thing happens when several
RFC's are pending at the same time.)
The call fails if the pending-RFC queue is empty.
The program should use the contact name in this
packet to choose a server program to execute. This
server program will then LSN to (presumably) the same
contact name, thus picking up the RFC.
Interrupts
IOC error interrupts occur if an attempt is made to IOT
when the connection is in an improper state, or to IOT
a packet with an illegal opcode.
An I/O channel interrupt is signalled on the input channel
when the number of queued buffers changes from zero to
non-zero.
An I/O channel interrupt is signalled on the output channel
when the number of queued buffers changes from greater or
equal to the window size, to less than the window size.
An I/O channel interrupt is signalled on the input channel
when the connection state changes.
Interrupts can be used for
(1) detecting when input arrives.
(2) detecting when the system is willing to accept
output.
(3) detecting when the other end does a CLOSE.
(4) detecting when a requested connection
is accepted or rejected.
(5) detecting when a request for connection
comes into a listening server.
---------- Material after this point may be inoperative ----------
>Comparison with LCSnet, and other blathering.
>>Principal differences
The LCSnet proposed protocol is called DSP. The Chaosnet protocol will
just be called chaos in this section.
(1) DSP specifies things in terms of bytes where Chaosnet specifies them
in terms of packets. We choose packets to increase the simplicity and
efficiency of the scheme. DSP has to work in terms of bytes because it
allows packets to be reformatted en route, hence
(2) DSP assumes that gateways can exist between networks with the same
protocols but different packet sizes. Therefore, the protocol has to
allow for the fact that packets may be reformatted en route. I happen to
believe that this situation is extremely unlikely to exist, and in fact
gateways between "different" networks will have to do much more than just
change the packet size. Therefore, it makes sense to make the gateway
worry about gateway issues, rather than have them permeate the whole
protocol. I believe that gateways will look more like regular network
ports than like transmission media; to get from a host on the local net to
a host on the arpa net, one will connect to the arpa net gateway and ask
it to open a connection from itself to the host on the arpa net, then tie
those connections together. A gateway will perform not only packet
reformatting, but protocol translation, flow control on both sides, and
maybe even character set translation. There can also be entities called
"bridges", which connect two networks (or two separate segments of one
network) with the same protocol. A bridge simply forwards any packets it
receives, but never alters the packets, and never looks inside them except
to find out where to forward them to.
(3) A related difference is that DSP includes the arpa net, and TCP, and
by extension all the networks in the universe, in its port number address
space. Chaosnet would have you connect to a gateway, then send the
gateway the port number to connect to in the foreign address space
separately. Chaosnet does have to include all networks reachable by
bridges in its address space.
(4) Chaosnet has an "opcode" field in the packet header, where DSP does
not. DSP achieves the same effect with various bits here and there. It
makes little difference unless user-level programs decide to exploit the
opcode field.
(5) DSP and Chaosnet have quite different mechanisms for creating
connections. In DSP, one never creates a connection, exactly; one simply
starts sending to a port address. Local network note #3 mumbles about how
one discovers which port address to send to, but I have not seen any
specifics. In Chaosnet, the mechanism for finding out where to send to
and the mechanism for creating a connection are intertwined; the excuse is
that often the process being connected to is created at the same time as
the connection.
(6) DSP uses unique, never-reused port IDs. Chaosnet does not. The
problem with unique, never-reused IDs is that I know of no system that can
implement them. Multics comes close, with the aid of a special hardware
clock. The clock is set from the operator's watch when the system is
powered on, and the mechanism depends on the fact that the error in the
operator's watch is less than the time required to bring up the system
after a power failure. Small systems that cannot afford special hardware
just for this, and don't have permanent disk storage, would find it very
hard to generate unique IDs.
Chaosnet prefers to rely on a scheme that doesn't require special
hardware, but nearly always works. By requiring a connection to be opened
before data packets can be sent through it, and by some assumptions about
the structure of the network, the problem is eliminated. See the Flow and
Error Control section for further discussion.
(7) DSP closes the two directions of a connection separately. Why?
>>High priority data packets, interrupts, and flushing.
The basic idea is to note that if you want to send a high priority
message, this means you want it out of order with respect to previously-
sent data on some connection. Therefore, high priority data should be
sent over an auxiliary connection. The per-connection overhead is not
prohibitively high, and this eliminates vast quantities of hair from the
innermost portion of the system.
One advantage that DSP gains by having "high priority messages" built into
the system is that it also incorporates a standardized way to "mark" a
particular point in a data stream. However, this is comparatively
unimportant, particularly since I think high-priority messages will
probably never get used. The only place I've heard them proposed to be
used is with Telnet, but ordinary terminals get along quite well without
"out of band" signals when used with reasonable operating system software.
Interrupts and flushing of input are random crocks associated with high
priority messages. I don't propose to implement them either.
>>Datagrams. (Connections used only to pass a single packet.)
These are easy. The guy who wishes to send a datagram does OPEN, IOTs an
RFC to the service to which the gram is to be sent, and NETBLKs waiting
for the connection to open up. He then IOTs the data packet, FLUSHes
waiting for it to get there, then CLOSEs.
The server OPENs and IOTs an OPN in response to the RFC. She then IOTs in
the datagram packet, CLOSEs, and goes off processing the message.
Four packets are transmitted, two in each direction. (An RFC, an OPN, a
DATA, and an ACKing NOP.) No need to send any CLS messages, since each
user process knows to do a CLOSE system call after one data packet has
been transmitted. It has been claimed that this is the theoretical
minimum if acknowledgement is required. The reason is that the data
packet must contain some unique id generated by the RECEIVER to avoid
duplicates, and it must be acknowledged, so that's two packets in each
direction, with no combining possible.
Note that as [someone at] PARC has pointed out, for the important case of
side-effect-free transactions, a timeout can substitute for
acknowledgement, and only two packets are necessary. See ANS.
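To make the four-packet exchange concrete, here is a minimal sketch in
Python. The opcode sequence comes from the text above; the function and
the tuple representation of packets on the wire are invented purely for
illustration.

```python
def datagram_exchange():
    # Hedged sketch of the datagram exchange described above; each
    # entry is (sender, receiver, opcode). Both sides then just CLOSE
    # without any CLS packets being transmitted.
    wire = []
    wire.append(("user", "server", "RFC"))   # user asks for the connection
    wire.append(("server", "user", "OPN"))   # server accepts the RFC
    wire.append(("user", "server", "DATA"))  # the single datagram packet
    wire.append(("server", "user", "NOP"))   # the ACKing NOP for the DATA
    return wire
```

Four packets total, two in each direction, with no combining possible,
as claimed in the text.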
>>Why not multiple messages per packet?
[1] Not needed for data. The division of the data stream into packets is
invisible to the real user, anyway. It's only used by the "user-ring"
portion of the network system software.
[2] Extra complexity. Consider the hair involved with packed control
messages in the Arpanet. Because of the control link being shared between
multiple connections between the same pair of hosts, this could save a
little. I don't know of any NCP that does this; furthermore, having that
shared facility is a bad idea. The only case in the Arpanet where packing
two control messages into one packet is useful is when opening a
connection, where the receiver wants to send both STR and ALL. In this protocol
we just put the window size in as part of the RFC and OPN messages.
[3] There is an argument that having message boundaries separate from
packet boundaries is useful because gateways between different networks
may need to split up packets because the two networks may have different
maximum packet sizes. My feeling about this is that the gateway is likely
to have to do a good deal more than that. It seems like too much to wish
for that the two networks should use exactly the same packet format,
protocols, or even character set; so the gateway, rather than being a
packet-reformatting device, is much more likely to look like a user program
with two connections, one on one network and one on the other, which
passes data between the connections with appropriate conversion. In
particular, flow control is likely to be host1 to gateway and host2 to
gateway, rather than host1 to host2.
>>Why not have a checksum in the packet?
This network is likely to have a very diverse collection of machines on
it, which means it will be impossible to define a checksum which can be
computed efficiently in software on all machines. Now all hardware links
in the system ought to have whatever amount of hardware checking is
appropriate to them, but due to the efficiency cost of a user-level
end-to-end checksum, this should not be a built-in requirement of the basic
low-level protocol. Instead, checksumming should be an optional feature
which some higher-level protocols (those that need it because the data
being passed through them is so vitally important that every possible
effort must be made to ensure its correctness) may implement.
Checksumming should be implemented at the user level in exactly the same
way and for exactly the same reasons as encryption should be implemented
at the user level.
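As one illustration of what a user-level option might look like, a
higher-level protocol that wants checksumming could run something as
simple as the following over its data. The particular 16-bit additive
sum is my assumption for the sketch; nothing in the base protocol
specifies or carries it.

```python
def additive_checksum_16(data: bytes) -> int:
    # 16-bit additive checksum with wraparound; one arbitrary choice a
    # higher-level protocol could make at the user level. The base
    # low-level protocol neither computes nor transmits this.
    s = 0
    for b in data:
        s = (s + b) & 0xFFFF
    return s
```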
>>How about host-independent user-level protocols, where one just connects
to a service and doesn't have to know what host it's at today?
Yeah, how about it? As far as I know, this protocol provides an adequate
base for constructing such a thing. Also I haven't seen anything
published on the subject.
>>Very small hosts.
E.g. we'd like to put the Chess machine on the net. It has very little
memory, but not totally impotent microcode. A small host need only
support one connection, may ignore WIN, LOS, and CLS, may only have one
packet in transmission at a time, and may process received packets one at a
time (ignoring any that come in until the first one has been fully
processed). It IS necessary to check that received DATA packets come in
in the right order.
RFC may be handled by remembering the other guy's host number and index,
and sending back a completely canned OPN. The contact name is ignored.
If a second user tries to connect while a first user is connected, the
first user gets bumped. Let them fight it out on some larger machine (or
the floor) for who will get to use the small machine. Never originate any
packet type other than DATA and that one OPN.
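The minimal-host rules above can be sketched as follows. The class and
its packet arguments are invented for illustration (a real host reads
the header fields described earlier); only the behavior comes from the
text: one connection, a canned OPN, strict in-order DATA, a second RFC
bumping the first user, and everything else ignored.

```python
class TinyHost:
    # Hedged sketch of a very small host, e.g. the Chess machine.
    def __init__(self):
        self.peer = None     # (host, index) of the one supported connection
        self.expected = 1    # next DATA packet number that will be accepted
        self.to_send = []    # only DATA and the one canned OPN ever go out

    def receive(self, opcode, src, pktno=0, data=b""):
        if opcode == "RFC":
            self.peer = src                    # second RFC bumps the first user
            self.expected = 1                  # contact name is ignored
            self.to_send.append(("OPN", src))  # completely canned OPN
            return None
        if opcode == "DATA" and src == self.peer and pktno == self.expected:
            self.expected += 1                 # in the right order: accept it
            return data                        # hand to the one-at-a-time consumer
        return None                            # WIN, LOS, CLS, out-of-order: ignored
```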
Attaching ordinary terminals "directly" to the network is obviously
undesirable.
>Transmission Media
This section describes how packets are encapsulated for transmission
through various media, and what auxiliary hair is needed by each medium.
>>Ethernet
The messages transmitted through the ether (or CAIOS) net consist of a
packet followed by a three-word trailer:
+----------------+
| header | 8 words
+----------------+
| data | 0-52 words
+----------------+
| immediate dest | 1 word
+----------------+
| immediate src | 1 word
+----------------+
| CRC check word | 1 word
+----------------+
The three trailer words are looked at by the hardware; the last two of
them are supplied by the hardware. The reason this stuff is in a trailer
rather than a leader is that the caiosnet hardware actually transmits the
packet backwards. However, this is transparent to the software.
Bytes are sent two per word. The low-order byte is first (pdp11
standard).
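The two-bytes-per-word, low-order-byte-first packing can be sketched
like this (the function names are mine, not part of any interface):

```python
def pack_words(data: bytes):
    # Pack 8-bit bytes two per 16-bit word, low-order byte first
    # (pdp11 standard). An odd trailing byte is padded with zero.
    if len(data) % 2:
        data = data + b"\0"
    return [data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)]

def unpack_words(words):
    # Inverse operation: 16-bit words back to bytes, low byte first.
    out = bytearray()
    for w in words:
        out.append(w & 0xFF)
        out.append((w >> 8) & 0xFF)
    return bytes(out)
```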
>>TEN11 Interface
[The following total hair has not been checked recently.]
Since interrupts can't be sent through the TEN11 interface, the pdp10 can
only service the network at relatively-infrequent intervals, for instance
every 1/60'th of a second. Therefore it is necessary to have queues of
packet buffers in each direction. This provides high speed by allowing
several packets to be processed at a time.
The speed and reliability of the TEN11 interface eliminates any need for
error checking. (ha ha) [ho ho] <he he> To decrease the load on the AI
pdp10, it is assumed that the pdp11's will be responsible for swapping the
bytes in the data portions of packets so that they will be in pdp10
standard order.
Even though the contents of packets will not be error-checked, the pdp10
must check addresses to avoid being screwed by a losing pdp11.
The form of a packet encapsulated for the TEN11 interface will then be
|-----------------|-----------------|----|
| queue thread | 0=empty, 1=full | 0 |
|-----------------|-----------------|----|
| #bytes | opcode | unused | 0 |
|-----------------|-----------------|----|
|destination host |destination index| 0 |
|-----------------|-----------------|----|
| source host | source index | 0 |
|-----------------|-----------------|----|
| packet # | ack packet # | 0 |
|-----------------|-----------------|----|
| data 0 | data 1 | data 2 . . . | 0 |
| | 0 |
|-----------------|-----------------|----|
for a total of 31 36-bit words, or 62 pdp11 words.
The queue thread is the pdp11 address of the next packet-buffer in a
queue, or zero if this is the last. The empty/full indicator says whether
this buffer currently contains a packet, or not.
The following is an attempt to express the algorithms of the pdp10 and
pdp11 in concise form. Hopefully they are self-explanatory.
Several queues of buffers need to exist in the pdp11. Only two of these
are known to the pdp10.
TO10QF - first buffer queued for transmission to the 10.
TO10QL - last buffer queued for transmission to the 10. Exists so that
buffers can be appended to the list more quickly.
TO10AC - first buffer in list of buffers actively being gobbled by the 10.
Set by 11, cleared by 10.
TO10FR - copy of TO10AC. Used to free the buffers after the 10 gobbles
them.
(come-from pdp11-packet-receivers
when (eq (destination-host ?packet) pdp10)
;re-arrange 8-bit bytes for 10
(swap-bytes (data-part packet))
;Add to to-10 queue
(set (thread packet) nil) ;nil=0
(cond ((null TO10QF)
(setq TO10QF packet TO10QL packet))
(t (set (thread TO10QL) packet)
(setq TO10QL packet)))
;Try to activate to-10 queue
(cond ((null TO10FR)
(setq TO10FR TO10QF
TO10AC TO10QF
TO10QF nil
TO10QL nil)))
)
(come-from pdp11-polling-loop
when (and (not (null TO10FR)) ;buffers were sent to 10
(null TO10AC)) ;and 10 has finished gobbling
(mapcar 'make-buffer-free TO10FR) ;mapcar follows queue words
(setq TO10FR nil) ;OK to activate more buffers now
(cond ((not (null TO10QF)) ; more stuff waiting, activate it now
(setq TO10FR TO10QF
TO10AC TO10QF
TO10QF nil
TO10QL nil)))
)
(come-from pdp10-clock-level
when (and (not (null TO10AC)) ;11 is sending buffers
(not web-buffers-locked))
;copy to user, process control message, or reject if buffers full
(mapcar 'process-input
TO10AC)
;signal pdp11 that all packets have been gobbled
(setq TO10AC nil))
FRBFLS - list of free buffers. cons-buffer gets from here,
make-buffer-free puts here.
FM10AC - list of buffers into which pdp10 may place packets
set by 11 / cleared by 10.
FM10GB - copy of FM10AC, used by 11 to process buffers after
10 has placed packets into them.
(come-from pdp11-polling-loop
when (and (null FM10GB) ;10 needs some buffers &
(not (null FRBFLS))) ; free buffers available
;give the 10 a list of a suitable number of empty buffers
(repeat max-at-a-whack times
(and (null FRBFLS) (exitloop))
(setq buffer (cons-buffer)) ;pull off free list
(set (full-indicator buffer) nil) ;contains no packet
(set (thread buffer) FM10GB) ;cons onto list
(setq FM10GB buffer))
(setq FM10AC FM10GB) ;give buffer list to 10
)
(come-from pdp11-polling-loop
when (and (not (null FM10GB)) ;gave 10 some buffers
(null FM10AC)) ;which it has used
;process packets sent from 10.
(mapcar
'(lambda (buffer)
(cond ((null (full-indicator buffer))
(make-buffer-free buffer)) ;didn't get used
(t (swap-bytes buffer)
(send-to-destination buffer))))
FM10GB)
(setq FM10GB nil)) ;no buffers active in 10 now
(come-from pdp10-clock-level
when (and (not (null FM10AC)) ;buffer space avail
(not web-buffers-locked)) ;no M.P. interference
;send as many packets as possible
(mapcar
'(lambda (buffer)
(cond ((needs-to-be-sent ?packet) ;find a sendable packet somewhere
(copy-into buffer packet)
(set (full-indicator buffer) t))))
FM10AC)
;signal pdp11 to gobble the packets
(setq FM10AC nil))
To avoid the need for a gross number of exec pages in the pdp10, the
FM10AC and TO10AC words, and all the packet buffers, should lie in a
single 4K page. The best location for this page varies from machine to
machine. On dedicated 11's such as the AI TV11, the MC console 11, etc.
it should probably just be the first 4K of memory. On the logo machine,
it would probably be best to put this page up in high memory where RUG
won't mess with it. In the case of the mini-robot system, I'm not sure.
It would be best not to try to use this protocol with "general purpose"
machines, because of problems with finding the list headers and packet
buffers, problems with telling whether the machine is running the right
program, etc. It should be used just as a way for the AI machine to get a
path to the net.
>>DL10 & DTE20
[Outline only]
[Just use the pdp11 as a substitute for a direct chaosnet interface.]
[Basically, the 11 says ready to transfer (in either direction), the 10
sets up the pointers and says to transfer, and the 11 transfers the cruft.
To eliminate an extra level of buffering, on input transfers the 11 makes
a copy of the first 4 16-bit words of the header available to the 10 when
it first says "ready to transfer." The 10 uses these to decide where to
copy the packet into. It helps if you don't try to use a DL10 on a
machine with a cache.]
>>Asynchronous line
Packets are encapsulated by preceding them with a start of text (202 octal),
and following them with a 1-byte additive checksum and an end of text (203
octal).
The 16-bit words are transmitted low order byte first. If the checksum is
wrong the receiver ignores the packet. The start and end characters are
just there to help in ignoring random noise on the line. If they don't
appear, the packet is ignored. The full 60-word packet is not
transmitted; the #bytes count is used to determine how many of the data
bytes to transmit; the receiver fills the remainder with zero (or garbage,
at its convenience).
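The encapsulation just described can be sketched as follows. The
framing characters and the 1-byte additive checksum are from the text;
I assume the checksum covers just the packet bytes, and the sketch works
on an already-serialized byte string rather than 16-bit words.

```python
STX, ETX = 0o202, 0o203   # start/end of text framing characters

def frame(packet: bytes) -> bytes:
    # Precede the packet with STX, follow it with a 1-byte additive
    # checksum and ETX. Only the #bytes-counted portion of the packet
    # should be passed in; the rest is not transmitted.
    ck = sum(packet) & 0xFF
    return bytes([STX]) + packet + bytes([ck, ETX])

def unframe(raw: bytes):
    # Return the packet, or None if framing or checksum is wrong;
    # the receiver just ignores bad packets.
    if len(raw) < 3 or raw[0] != STX or raw[-1] != ETX:
        return None
    packet, ck = raw[1:-2], raw[-2]
    return packet if sum(packet) & 0xFF == ck else None
```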
This protocol is intended mainly for communication between the plasma
physics pdp11 in bldg. 38 and a pdp11 in 545, until the caiosnet gets
extended that far (or a longer-distance, lower-speed caiosnet is extended
to various machines off in that direction.)
>Higher-Level Protocols
>>Telnet
Essentially this should follow "New Supdup". [Unresolved issues with
respect to output reset remain.]
>>File-Access
[*****]
>>Mail
[*****]
>>Locate-named-service
[*****]